Beyond Turing: Testing LLMs for Intelligence
In the nearly two years since its release, ChatGPT has shown some remarkably human-like behavior, from trying to seduce a journalist to acing the bar exam. That has left some people wondering whether computers are approaching human levels of intelligence. Most computer scientists do not think machines are the intellectual equals of people yet, but they have not reached a consensus on how to measure intelligence, or on what exactly to measure. The canonical experiment to check for machine intelligence is the Turing test, proposed by Alan Turing in his 1950 paper "Computing Machinery and Intelligence." Turing argued that if a computer could convince a person having a typed conversation with it that it was human, that might be a sign of intelligence.
PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment
Yongchao Chen, Jacob Arkin, Yilun Hao, Yang Zhang, Nicholas Roy, Chuchu Fan
Prompt optimization aims to find the best prompt for a large language model (LLM) on a given task. LLMs have been successfully used to help find and improve prompt candidates for single-step tasks. However, realistic tasks for agents are multi-step and introduce new challenges: (1) prompt content is likely to be more extensive and complex, making it more difficult for LLMs to analyze errors; (2) the impact of an individual step is difficult to evaluate; and (3) different people may have varied preferences about task execution. While humans struggle to optimize prompts, they are good at providing feedback about LLM outputs; we therefore introduce a new LLM-driven discrete prompt optimization framework that incorporates human-designed feedback rules about potential errors to automatically offer direct suggestions for improvement. Our framework is structured as a genetic algorithm in which an LLM generates new candidate prompts from a parent prompt and its associated feedback; we use a learned heuristic function that predicts prompt performance to efficiently sample from these candidates. This approach significantly outperforms both human-engineered prompts and several other prompt optimization methods across eight representative multi-step tasks (an average 27.7% and 28.2% improvement over the current best methods on GPT-3.5 and GPT-4, respectively). We further show that the score function for tasks can be modified to better align with individual preferences. We believe our work can serve as a benchmark for automatic prompt optimization for LLM-driven multi-step tasks. Datasets and code are available at https://github.com/yongchao98/PROMST. The project page is at https://yongchao98.github.io/MIT-REALM-PROMST.
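The abstract describes the optimization loop only at a high level. The sketch below is a minimal, hypothetical rendering of that loop, not the authors' implementation (see the linked repository for that): `call_llm`, `run_task`, the feedback rules, and the length-prior `heuristic_score` are all stand-in assumptions, with the learned score predictor reduced to a stub for brevity.

```python
import random

# --- Stand-ins (hypothetical): replace with a real LLM call and a real task ---
def call_llm(instruction: str) -> str:
    """Placeholder for an LLM call; here it just perturbs the parent prompt."""
    parent = instruction.split("PARENT:")[-1]
    return parent.strip() + f" [rev {random.randint(0, 999)}]"

def run_task(prompt: str) -> tuple[float, list[str]]:
    """Run the multi-step task with `prompt`; return (score, error trace).
    Stubbed with random values for illustration."""
    return random.random(), ["step 3: agent repeated an action"]

# Human-designed feedback rules: map observed error patterns to direct
# suggestions for improvement (illustrative examples, not the paper's rules).
FEEDBACK_RULES: list[tuple[str, str]] = [
    ("repeated an action", "Add an instruction to track visited states."),
    ("invalid action", "List the allowed actions explicitly in the prompt."),
]

def collect_feedback(errors: list[str]) -> str:
    hints = [hint for trigger, hint in FEEDBACK_RULES
             for err in errors if trigger in err]
    return " ".join(hints) or "No rule matched; rephrase ambiguous steps."

def heuristic_score(prompt: str) -> float:
    """Predicted performance of a candidate prompt. In the paper this is a
    learned model over (prompt, score) pairs; a length prior stands in here."""
    return 1.0 / (1.0 + abs(len(prompt) - 400) / 400)

def optimize(seed_prompt: str, rounds: int = 5,
             n_children: int = 8, n_keep: int = 2) -> str:
    best_prompt, best_score = seed_prompt, run_task(seed_prompt)[0]
    frontier = [seed_prompt]
    for _ in range(rounds):
        candidates = []
        for parent in frontier:
            # Gather errors from a run, turn them into concrete feedback,
            # and let the LLM propose child prompts from parent + feedback.
            _, errors = run_task(parent)
            feedback = collect_feedback(errors)
            for _ in range(n_children):
                candidates.append(call_llm(
                    f"Improve this prompt. FEEDBACK: {feedback} PARENT: {parent}"))
        # Heuristic pre-filter: only evaluate the most promising children.
        candidates.sort(key=heuristic_score, reverse=True)
        frontier = candidates[:n_keep]
        for child in frontier:
            score, _ = run_task(child)
            if score > best_score:
                best_prompt, best_score = child, score
    return best_prompt

if __name__ == "__main__":
    print(optimize("You are an agent. Complete each step carefully."))
```

The heuristic pre-filter is the cost-saving step this structure is built around: rather than running the expensive multi-step task on every generated child, only the top-ranked candidates are actually evaluated.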